Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DM-45906: Create app that can use Butler to implement IVOA SIAv2 protocol #3

Merged
merged 9 commits into from
Oct 18, 2024

Conversation

stvoutsin
Copy link
Member

@stvoutsin stvoutsin commented Sep 23, 2024

Summary

PR introduces the SIA v2 (over Butler) FastAPI application. The app utilizes the dax_obscore & daf_butler packages to search for images that match certain query parameters. The API follows the IVOA specification: https://www.ivoa.net/documents/SIA/

A few things to note:

  • Current version doesn't yet support querying for multiple different collections, stored in separate Butler repos. This is pending discussion with Gregory & Tim to decide best way to structure obscore configs, repos and the sia app and how the query will interact with these. Potentially we'll have to run a query per butler repo available and combine the results, assuming the result schema is consistent between them..
  • Query endpoint is not async. This is because the underlying siav2_query from the obscore package is also not async. If we want to make it async, we probably need some sort of worker system in place.
  • Current version is limited to returning VOTable as the response
  • Exceptions in query are intercepted and returned as a VOTable error response.
  • There is a temporary workaround for Butler to define a default instrument. This should be addressed at some point in daf_butler, after which we can remove the workaround.
  • Requirements: The current version has requirements that were generated without hashes, because there isn't a tagged or pip release for dax_obscore with the changes needed.

Design decisions

Current system was designed to support Direct & Remote butler configurations. There is a factory method create_butler which should create the butler based on the configuration passed in. Initially the designed was more abstract, where I defined the notion of a QueryEngine, and DirectButlerEngine, RemoteButlerEngine were implementations, however this felt too abstract and an early optimization for something that may not be needed, i.e. other implementations of siav2_query.

The main configuration other than the butler_type needed is butler_data_collections. This expects a list of ButlerDataCollection objects, which has following parameters, all of which are used to generate the right Butler object. Reason for butler_data_collections being a list, is that we may in the future want to support multiple different Butler repos.

config: The obscore yaml configuration object for the repo 
repository: Path to the repository (May be a URL or a local path)
label: Label for the repo (e.g. "dp02")
defaultinstrument: Default instrument (e.g. "LSSTCam-imSim"")
default:  False Whether this should be the default collection
datalinkurl: If we want to overwrite the datalink_url_fmt attribute in the dax_obscore config.

The query parameters are defined as a dataclass, because it seemed difficult to get a Pydantic model working with list query parameters (related issue: fastapi/fastapi#10556). Also siav2_query has it's own Pydantic model for the SIAv2 Parameters with it's own validation, it seemed better to avoid duplicating the validation.

Prototype notebook

Example notebook demonstrating access via python can be found here:
https://github.com/lsst-sqre/sia/blob/u/stvoutsin/example-notebook/examples/SIAv2_Example.ipynb

How was this tested

A test instance was deployed on data-dev. Process was to modify science-platform app to point to phalanx where sia is enabled, sync only sia and then point the synced sia app to the GH dev branch.

Sample query:
https://data-dev.lsst.cloud/api/sia/query?POS=CIRCLE+55.7467+-32.2862+0.05&time=60550.31803461111+60550.31838182871

Validation was run (using a sample HSC direct butler setup) with the following services and passed with an A++ rating
Euro-vo validation
Paris-vo validation

What hasn't been tested
Apart from the above query, I haven't tested many other parameters, as I'm not really sure about the holdings and what type of queries should return data.

I also haven't fully concluded any concurrent access tests (Ongoing work).

@stvoutsin stvoutsin force-pushed the tickets/DM-45906 branch 2 times, most recently from afdf067 to a187db8 Compare October 8, 2024 16:45
@stvoutsin stvoutsin marked this pull request as ready for review October 8, 2024 18:16
@stvoutsin stvoutsin force-pushed the tickets/DM-45906 branch 4 times, most recently from 7327e38 to abc1abc Compare October 8, 2024 22:15
Copy link
Member

@rra rra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an absolute ton of work and overall looks great! Most of the comments are small things. I just have a few larger structural things.

I vote for dropping local Butler support. Our position has been that Science Platform services will only support remote Butler, and I've dropped local Butler support from vo-cutouts and datalinker already. I'm worried that someone will use local Butler support and then be upset when we delete it later.

Updating the Makefile to the latest from the FastAPI template, that will decide whether to generate hashes per dependency and you can stop thinking about that.

requirements/main.in Outdated Show resolved Hide resolved
src/sia/data/dp02.yaml Outdated Show resolved Hide resolved
src/sia/dependencies/labeled_butler_factory.py Outdated Show resolved Hide resolved
src/sia/dependencies/query_params.py Outdated Show resolved Hide resolved
src/sia/dependencies/token.py Show resolved Hide resolved
src/sia/services/data_collections.py Outdated Show resolved Hide resolved
src/sia/services/data_collections.py Outdated Show resolved Hide resolved
src/sia/services/response_handler.py Outdated Show resolved Hide resolved
src/sia/services/response_handler.py Outdated Show resolved Hide resolved
src/sia/timer.py Outdated Show resolved Hide resolved
@stvoutsin
Copy link
Member Author

stvoutsin commented Oct 12, 2024

I've applied hopefully all the suggested changes. There are a few more changes (sorry about the quite long commit) which are related to:

  • Target is now one SIAv2 service per Butler repo. This current implementation is designed to support a single Phalanx SIA app which can support multiple different SIAv2 IVOA services, by using a collection URL endpoint param to separate between them:
    i.e. /api/sia/dp02/query
    Let me know if this is problematic in any way. I guess the alternative is one app per data-release (dr1, dr2, prompt products, others? etc..)
  • I've also tried to make the app as agnostic as possible to the DIRECT vs REMOTE butler type, as this is now an attribute of the ButlerDataCollection class passed in as a configuration. The LabeledButlerFactory can also create direct Butler objects, so a lot of the complexity should be now removed. This way hopefully a single SIA can also support multiple Butler repos of different type.

@stvoutsin
Copy link
Member Author

This is an absolute ton of work and overall looks great! Most of the comments are small things. I just have a few larger structural things.

I vote for dropping local Butler support. Our position has been that Science Platform services will only support remote Butler, and I've dropped local Butler support from vo-cutouts and datalinker already. I'm worried that someone will use local Butler support and then be upset when we delete it later.

Updating the Makefile to the latest from the FastAPI template, that will decide whether to generate hashes per dependency and you can stop thinking about that.

I've left the requirements as non-hashed because dax_obscore isn't on pypi yet. Hopefully it is soon and I can update the requirements to included the hashes, do we want to wait for that before merging? Is there a better way I should handle this?

@stvoutsin
Copy link
Member Author

Note that I've in parallel made some changes that I'll wait for PR after this is finished, which introduces a self-description VOTable document & also adds a dependency which loads the obscore configs during the app initialization, so that there are no calls to the external config file during the query. I didn't want to include them in this PR as to not introduce additional changes on top of the recent PR fixes.

@rra
Copy link
Member

rra commented Oct 16, 2024

You should be able to use the latest Makefile from templates and just run make update-deps. uv will hash the dependencies that it can and not hash the dependencies that it can't. With the latest uv, one should no longer need to worry about this problem of whether to hash or not hash, since it should just figure it out.

Copy link
Member

@rra rra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new middleware looks great! I am really excited if this works, since this will solve a bunch of problems for IVOA protocol apps.

This all looks great. I left a few more comments on the unresolved conversations, but overall I'm great with merging this all. This looks very solid.

@stvoutsin
Copy link
Member Author

The new middleware looks great! I am really excited if this works, since this will solve a bunch of problems for IVOA protocol apps.

This all looks great. I left a few more comments on the unresolved conversations, but overall I'm great with merging this all. This looks very solid.

Yep that worked, just had to modify the Dockerfile to get it to build without complaining about not having hashes for that github repo

@stvoutsin stvoutsin merged commit 2d30521 into main Oct 18, 2024
3 checks passed
@stvoutsin stvoutsin deleted the tickets/DM-45906 branch October 21, 2024 17:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants